
    Action Recognition in Video Using Sparse Coding and Relative Features

    This work presents an approach to category-based action recognition in video using sparse coding techniques. The proposed approach makes two main contributions: i) a new method to handle intra-class variations by decomposing each video into a reduced set of representative atomic action acts, or key-sequences, and ii) a new video descriptor, ITRA (Inter-Temporal Relational Act Descriptor), that exploits the power of comparative reasoning to capture relative similarity relations among key-sequences. To obtain key-sequences, we introduce a loss function that, for each video, identifies a sparse set of representative key-frames capturing both the relevant particularities of the input video and the relevant generalities of the complete class collection. To obtain the ITRA descriptor, we introduce a novel scheme to quantify relative intra- and inter-class similarities among local temporal patterns arising in the videos. The resulting ITRA descriptor proves highly effective at discriminating among action categories. As a result, the proposed approach achieves remarkable action recognition performance on several popular benchmark datasets, outperforming alternative state-of-the-art techniques by a large margin. Comment: Accepted to CVPR 201

    Comparing Neural and Attractiveness-based Visual Features for Artwork Recommendation

    Advances in image processing and computer vision in recent years have enabled the use of visual features in artwork recommendation. Recent works have shown that visual features obtained from pre-trained deep neural networks (DNNs) perform very well for recommending digital art. Other recent works have shown that explicit visual features (EVF) based on attractiveness can perform well in preference prediction tasks, but no previous work has compared DNN features against specific attractiveness-based visual features (e.g., brightness, texture) in terms of recommendation performance. In this work, we study and compare the performance of DNN and EVF features for physical artwork recommendation, using transactional data from UGallery, an online store of physical paintings. In addition, we perform an exploratory analysis to understand whether DNN embedded features relate to certain EVF. Our results show that DNN features outperform EVF and that certain EVF features are better suited to physical artwork recommendation; finally, we show evidence that certain neurons in the DNN might partially encode visual features such as brightness, providing an opportunity to explain recommendations based on visual neural models. Comment: DLRS 2017 workshop, co-located with RecSys 201
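    The abstract does not spell out the recommender itself. A common content-based baseline, assumed here rather than taken from the paper, scores each artwork a user has not bought by its maximum cosine similarity to the artworks the user has already bought. The same scoring works whether the vectors are DNN embeddings or EVF vectors (e.g., brightness, texture); the function names, item ids, and feature values below are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def recommend(features: Dict[str, List[float]], purchased: List[str], k: int) -> List[str]:
    """Hypothetical content-based recommender (an assumption, not the paper's model):
    rank items not yet purchased by their maximum cosine similarity to any
    purchased item and return the ids of the top k."""
    if not purchased:
        return []  # no purchase history means no visual profile to compare against
    candidates = [item for item in features if item not in purchased]

    def score(item: str) -> float:
        return max(cosine(features[item], features[p]) for p in purchased)

    return sorted(candidates, key=score, reverse=True)[:k]


# Toy 2D "visual features" for four artworks (hypothetical values).
features = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0], "d": [0.5, 0.5]}
print(recommend(features, ["a"], 2))  # ['b', 'd']
```

    Swapping the feature source (DNN embedding versus a handful of EVF values) while keeping this scoring fixed is one way to compare the two feature families on equal terms.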

    An Efficient Point-Matching Method Based on Multiple Geometrical Hypotheses

    Point matching across multiple images is an open problem in computer vision because of the numerous geometric transformations and photometric conditions that a pixel or point might exhibit in the set of images. Over the last two decades, different techniques have been proposed to address this problem. The most relevant are those based on the analysis of invariant features. Nonetheless, their main limitation is that invariant analysis alone cannot reduce false alarms. This paper introduces an efficient point-matching method for two and three views, based on the combined use of two techniques: (1) correspondence analysis derived from the similarity of invariant features and (2) the integration of multiple partial solutions obtained from 2D and 3D geometry. The main strength and novelty of this method is the determination of point-to-point geometric correspondence through the intersection of multiple geometrical hypotheses weighted by the maximum likelihood estimation sample consensus (MLESAC) algorithm. The proposal not only extends the methods based on invariant descriptors but also generalizes the correspondence problem to a perspective projection model in multiple views. The method has been evaluated on three types of image sequences: outdoor, indoor, and industrial. The proposed strategy discards most of the wrong matches and achieves F-scores of 97%, 87%, and 97% for the outdoor, indoor, and industrial sequences, respectively.
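    The abstract names MLESAC as the weighting step but does not detail it. As a toy illustration only, and not the authors' multi-view method, the sketch below applies MLESAC-style scoring to the simplest geometric hypothesis, a 2D translation between putative matches: residuals are modelled as a mixture of Gaussian inliers (standard deviation `sigma`) and outliers uniform over an area `area`, the inlier fraction is estimated with a few EM iterations, and the hypothesis with the lowest negative log-likelihood wins. The translation model, parameter values, and function names are assumptions made for this example.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, float]


def mlesac_cost(residuals: List[float], sigma: float, area: float,
                em_iters: int = 5) -> Tuple[float, float]:
    """Negative log-likelihood of residuals under an MLESAC-style mixture:
    inliers follow an isotropic 2D Gaussian (std sigma), outliers are uniform
    over `area`. Returns (cost, estimated inlier fraction gamma)."""
    norm = 1.0 / (2.0 * math.pi * sigma ** 2)
    p_out = 1.0 / area
    gauss = [norm * math.exp(-r * r / (2.0 * sigma ** 2)) for r in residuals]
    gamma = 0.5
    for _ in range(em_iters):
        # E-step: posterior probability that each match is an inlier.
        z = [gamma * g / (gamma * g + (1.0 - gamma) * p_out) for g in gauss]
        # M-step: the mixing parameter is the mean inlier posterior.
        gamma = sum(z) / len(z)
    cost = -sum(math.log(gamma * g + (1.0 - gamma) * p_out) for g in gauss)
    return cost, gamma


def mlesac_translation(src: List[Point], dst: List[Point], sigma: float, area: float,
                       trials: int = 200, seed: int = 0) -> Tuple[Point, float]:
    """Toy estimator for dst ~ src + t. Each hypothesis comes from one randomly
    chosen match (the minimal sample for a translation) and is scored by
    mlesac_cost; the lowest-cost hypothesis and its gamma are returned."""
    rng = random.Random(seed)
    best = None
    for _ in range(trials):
        i = rng.randrange(len(src))
        t = (dst[i][0] - src[i][0], dst[i][1] - src[i][1])
        res = [math.hypot(d[0] - s[0] - t[0], d[1] - s[1] - t[1])
               for s, d in zip(src, dst)]
        cost, gamma = mlesac_cost(res, sigma, area)
        if best is None or cost < best[0]:
            best = (cost, t, gamma)
    return best[1], best[2]
```

    Unlike plain RANSAC, which only counts inliers under a hard threshold, this cost weights every residual by its likelihood, so two hypotheses with the same inlier count can still be ranked by how tightly they fit.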

    Our Deep CNN Face Matchers Have Developed Achromatopsia

    Modern deep CNN face matchers are trained on datasets containing color images. We show that such matchers achieve essentially the same accuracy on the grayscale and color versions of a set of test images. We then consider possible causes for deep CNN face matchers "not seeing color". Popular web-scraped face datasets actually have 30% to 60% of their identities with one or more grayscale images. We analyze whether this grayscale element in the training set impacts the accuracy achieved, and conclude that it does not. Further, we show that even with a 100% grayscale training set, comparable accuracy is achieved on color or grayscale test images. We then show that the skin region of an individual's images in a web-scraped training set exhibits significant variation in its mapping to color space. This suggests that color, at least for web-scraped, in-the-wild face datasets, carries limited identity-related information for training state-of-the-art matchers. Finally, we verify that comparable accuracy is achieved by training on single-channel grayscale images, implying that a larger dataset can be used within the same memory limit, with a less computationally intensive early layer.
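    The memory and compute argument in the last sentence comes down to a factor of three in the input channels. The sketch below makes that arithmetic concrete; the BT.601 luma weights are one common grayscale conversion and are not necessarily the one the authors used, and the 112 x 112 crop size and 64-filter 3 x 3 first layer are hypothetical values chosen for illustration.

```python
from typing import Tuple


def to_gray(rgb: Tuple[int, int, int]) -> int:
    """8-bit luma from 8-bit RGB using ITU-R BT.601 weights (an assumed choice)."""
    r, g, b = rgb
    return round(0.299 * r + 0.587 * g + 0.114 * b)


def input_bytes(h: int, w: int, channels: int) -> int:
    """Storage for one 8-bit image of size h x w with the given channel count."""
    return h * w * channels


def first_conv_macs(h: int, w: int, c_in: int, c_out: int, k: int,
                    stride: int = 1, pad: int = 0) -> int:
    """Multiply-accumulate operations of a k x k convolution layer applied to
    an h x w x c_in input, producing c_out output channels."""
    h_out = (h + 2 * pad - k) // stride + 1
    w_out = (w + 2 * pad - k) // stride + 1
    return h_out * w_out * k * k * c_in * c_out


# Hypothetical setting: 112 x 112 face crops, first layer with 64 filters of 3 x 3, padding 1.
color_mem, gray_mem = input_bytes(112, 112, 3), input_bytes(112, 112, 1)
color_ops = first_conv_macs(112, 112, 3, 64, 3, pad=1)
gray_ops = first_conv_macs(112, 112, 1, 64, 3, pad=1)
print(color_mem / gray_mem, color_ops / gray_ops)  # 3.0 3.0
```

    Only the input storage and the first convolution scale with the channel count; every later layer sees the same feature-map shape in both cases, which is why the savings are confined to the data and the early layer.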

    The impact of MEG source reconstruction method on source-space connectivity estimation: A comparison between minimum-norm solution and beamforming.

    Despite numerous important contributions, the investigation of brain connectivity with magnetoencephalography (MEG) still faces multiple challenges. One critical aspect of source-level connectivity, largely overlooked in the literature, is the putative effect of the choice of inverse method on the subsequent cortico-cortical coupling analysis. We set out to investigate the impact of three inverse methods on source coherence detection using simulated MEG data. To this end, thousands of randomly located pairs of sources were created. Several parameters were manipulated, including inter- and intra-source correlation strength, source size, and spatial configuration. The simulated pairs of sources were then used to generate sensor-level MEG measurements at varying signal-to-noise ratios (SNR). Next, source-level power and coherence maps were calculated using three methods: (a) L2-Minimum-Norm Estimate (MNE), (b) Linearly Constrained Minimum Variance (LCMV) beamforming, and (c) Dynamic Imaging of Coherent Sources (DICS) beamforming. The performance of each method was evaluated using Receiver Operating Characteristic (ROC) curves. The results indicate that beamformers perform better than MNE for coherence reconstruction when the interacting cortical sources are point-like. On the other hand, MNE provides better connectivity estimation than beamformers when the interacting sources are simulated as extended cortical patches, where each patch consists of dipoles with identical time series (high intra-patch coherence). However, the performance of the beamformers for interacting patches improves substantially if each patch of active cortex is simulated with only partly coherent time series (partial intra-patch coherence).
    These results demonstrate that the choice of inverse method affects the results of MEG source-space coherence analysis, and that the optimal choice of inverse solution depends on the spatial and synchronization profile of the interacting cortical sources. The insights revealed here can guide method selection and help improve data interpretation regarding MEG connectivity estimation.
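    The abstract evaluates the inverse methods with ROC curves but does not say how positives and negatives were defined. In the minimal sketch below, labels are assumed to mark whether a location truly belongs to a simulated interacting source (1) or not (0), and scores are assumed to be reconstructed coherence values; the function names are hypothetical. The curve is traced by sweeping a detection threshold down through the distinct scores, and the area under it is taken with the trapezoidal rule.

```python
from typing import List, Tuple


def roc_curve(scores: List[float], labels: List[int]) -> List[Tuple[float, float]]:
    """(false-positive rate, true-positive rate) points from sweeping a threshold
    down through the distinct scores. Labels are 1 (true coupling) or 0;
    both classes must be present."""
    pos = sum(labels)
    neg = len(labels) - pos
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        thr = pairs[i][0]
        # Consume every sample tied at this threshold before emitting a point.
        while i < len(pairs) and pairs[i][0] == thr:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points: List[Tuple[float, float]]) -> float:
    """Area under an ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Perfectly separated scores give an area of 1.0.
print(auc(roc_curve([0.9, 0.8, 0.7, 0.6], [1, 1, 0, 0])))  # 1.0
```

    Because the curve is built from rankings rather than raw values, the comparison between MNE, LCMV, and DICS does not depend on the three methods producing coherence estimates on the same scale.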